-
Notifications
You must be signed in to change notification settings - Fork 292
Check that there are no changes during SR.scan #6413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check that there are no changes during SR.scan #6413
Conversation
For information we had the issue in integration during a live migration. The destination VDI has been created and introduced in the database after the SR.scan returns the list of VDIs and before the returned list is used by XAPI. The VDI has been forgotten and then the live migration task tried to use it.... and so it failed with invalid UUID. |
We'd really appreciate if someone at XS could run some live migration xenrt suites with this fix. It worked just fine for us but I'm wondering if there's some possible situation where it'd cause frequent retries during live migration. Thank you! |
ocaml/xapi/xapi_sr.ml
Outdated
let refs2 = List.map fst db_vdi2 in | ||
|
||
(* VDIs that are in db_vdi1 but not in db_vdi2 *) | ||
let vdis_removed = Listext.List.set_difference refs1 refs2 in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these lists short? set_difference
is O(n^2). When nothing is added and nothing is removed, why is this not just set equality, i.e., sorted lists are equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be thousands of VDIs in a system, definitely lists should not be used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of set equality/sorted lists are equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the code to use Set instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we just check for equality between two sets now instead of two set differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact I wanted to add (and I just pushed it now) some debug messages. Because as Pau said there can be hundreds of VDIs so it can helps to show the ones that have been added or removed. And probably I can remove the debug message that write the list before and after. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes set equality looks good now. Because if during the scan a VDI has been added and removed sets are not empty but we don't need to start a new scan because the list of VDIs is ok. So yes I'm using equality now.
316c7a3
to
c207cb4
Compare
6c88eb5
to
8b7a7f1
Compare
8b7a7f1
to
bfcb401
Compare
|> String.concat "," | ||
) | ||
limit ; | ||
debug "%s detected db change while scanning, retry limit left %d" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we report the VDIs that have been added or removed in vdis_ref_equal
I think it is ok to remove the dump of the two lists (before and after).
bfcb401
to
ef461b2
Compare
So the last version reports the Refs of VDI that has been removed or added. I agree that with the Ref we probably be able in the logs to find the corresponding UUID that makes more sense for the user. |
ef461b2
to
1d7a5b7
Compare
I think it's starting to look pretty good 😅 . Thanks for the feedback!!! |
@Vincent-lau if you can have a quick look again because it has been modified since your review ;) |
1d7a5b7
to
7276654
Compare
7276654
to
eb9b01f
Compare
Currently, we are only checking that no VDIs have been removed during the SR scan performed by the SM plugin. However, there are situations where a VDI has been added, and if this VDI is not present in the list obtained from SR.scan, it will be forgotten. The checks only prevent this in the case where the VDI was added during the scan. There is still a TOCTOU situation if the issue happens after the scan, and there is room for that. Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
eb9b01f
to
c927979
Compare
I added a debug message when the diff is clean and thus when the update of VDIs can start. |
Just for your information: this patch fixes the issue we had in CI. We've already tested it, but since the version was modified, I can confirm that this updated version behaves the same. |
I've kicked Toolstack Smoke and Verification tests with a build with these changes: 215889 These contain both intra and cross pool migrations to exercise this codepath to ensure there aren't any obvious regressions |
Intra-pool and cross-pool migration tests passed. There was an error elsewhere, but it's a testing issue. |
Currently, we are only checking that no VDIs have been removed during the SR scan performed by the SM plugin. However, there are situations where a VDI has been added, and if this VDI is not present in the list obtained from SR.scan, it will be forgotten. The checks only prevent this in the case where the VDI was added during the scan. There is still a TOCTOU situation if the issue happens after the scan, and there is room for that.